Production and use of phosphoethanolamine cellulose and derivatives

ABSTRACT

Phosphoethanolamine cellulose and methods of making and using it are disclosed. In particular, the invention relates to a method of producing a phosphoethanolamine cellulose biosynthetically using a BcsG phosphoethanolamine transferase for cellulose modification. Recombinant constructs encoding BcsG are described, including constructs encoding BcsG by itself or in combination with BcsE and BcsF, which increase the extent of cellulose modification and the amount of modified cellulose produced. Production of phosphoethanolamine cellulose in cell culture and derivatization of phosphoethanolamine cellulose are also described.

TECHNICAL FIELD

The present invention relates to modified celluloses and their use. In particular, the invention relates to compositions and methods of producing phosphoethanolamine cellulose and derivatives thereof.

BACKGROUND

Cellulose is the most abundant biopolymer on earth. Plants rely on the tensile strength and mechanical properties of cellulose to stand tall (Klemm et al. (2005) Angew. Chem. Int. Ed. 44, 3358-3393). Chemically, cellulose is a linear polysaccharide composed of β-1,4 linked glucose. Individual strands participate in strong hydrogen bonding networks with neighboring strands and contribute to the physical and chemical integrity of plant cell walls and cellulosic materials (Nishiyama et al. (2003) J. Am. Chem. Soc. 125, 14300-14306). Microorganisms are also major producers of cellulose (Romling et al. (2015) Trends Microbiol. 23, 545-557).

The essential genetic and protein machinery for cellulose production in bacteria includes the cellulose synthase genes, termed bcsA and bcsB, encoding cellulose synthase subunits, BcsA and BcsB (Omadjela et al. (2013) Proc. Natl. Acad. Sci. USA 110, 17856-17861). BcsA is an integral membrane protein containing the catalytic active site. BcsB interacts with BcsA at the periplasmic face of the inner membrane in Gram-negative bacteria, with the two subunits forming a channel for co-synthetic secretion of cellulose. Cellulose biosynthesis requires activation by the ubiquitous bacterial second messenger cyclic di-GMP (Jenal et al. (2017) Nat. Rev. Microbiol. 15, 271-284), which directly binds to BcsA (Morgan et al. (2014) Nat. Struct. Mol. Biol. 21, 489-496). Considerable interest has emerged in understanding the diversity of additional genes in cellulose biosynthesis operons present in many microorganisms (Romling et al., supra).

SUMMARY

The invention relates to phosphoethanolamine cellulose and its use as well as methods of producing phosphoethanolamine cellulose and derivatives thereof.

In one aspect, the invention includes a cellulose-producing host cell comprising a recombinant polynucleotide encoding a BcsG phosphoethanolamine transferase operably linked to a promoter.

In certain embodiments, the recombinant polynucleotide is provided by a plasmid or viral vector.

In other embodiments, the recombinant nucleic acid is integrated into the host cell genome.

In another embodiment, the host cell further comprises a recombinant polynucleotide comprising a BcsE gene and/or a BcsF gene operably linked to a promoter.

In another embodiment, the recombinant polynucleotide comprises a multicistronic vector expressing BcsG, BcsE, and BcsF. The multicistronic vector may comprise, for example, a polynucleotide encoding an internal ribosome entry site (IRES) or a T2A peptide.

In another embodiment, the recombinant polynucleotide comprises a bcsEFG operon.

In certain embodiments, the host cell is a bacterial cell, a plant cell, or an algae cell. For example, the cellulose-producing host cell may be a Gram-negative bacterium. Exemplary Gram-negative bacteria include those belonging to the Acetobacter (e.g., Acetobacter xylinum), Agrobacterium, Escherichia (e.g., Escherichia coli), and Salmonella (e.g., Salmonella enterica) genera.

In another embodiment, cellulose production is upregulated by cyclic di-GMP, for example, by adding cyclic di-GMP to the cell.

In another embodiment, the cellulose-producing host cell further comprises a recombinant polynucleotide comprising a promoter operably linked to a polynucleotide encoding diguanylate cyclase. The promoter may be an inducible promoter. The recombinant polynucleotide may be provided by a vector such as a plasmid or viral vector.

In another aspect, the invention includes a method of producing a phosphoethanolamine cellulose, the method comprising: a) culturing a cellulose-producing host cell comprising a recombinant polynucleotide encoding a BcsG phosphoethanolamine transferase operably linked to a promoter under conditions suitable for expression of the BcsG phosphoethanolamine transferase, wherein the phosphoethanolamine cellulose is produced; and b) isolating the phosphoethanolamine cellulose.

Media may be supplied with a continuous or batch fed system. In certain embodiments, culturing is performed in a growth medium comprising one or more carbon sources selected from the group consisting of glucose, fructose, acetate, or glycerol. In certain embodiments, culturing is performed at a temperature below 30° C. For example, culturing may be performed at a temperature in a range from about 25° C. to about 29° C., or any temperature in between, such as 25° C., 26° C., 27° C., 28° C., or 29° C. In other embodiments, culturing may be performed at a temperature in a range from about 30° C. to about 37° C., or any temperature in between, such as 31° C., 32° C., 33° C., 34° C., 35° C., or 36° C.

In another embodiment, the method further comprises increasing cellulose production by contacting the cellulose-producing host cell with cyclic di-GMP.

In another embodiment, the method further comprises transfecting the cellulose-producing host cell with a recombinant polynucleotide comprising a promoter operably linked to a polynucleotide encoding diguanylate cyclase. The promoter may be an inducible promoter. The recombinant polynucleotide may be provided by a vector such as a plasmid or viral vector.

In another aspect, the invention includes a composition comprising a phosphoethanolamine cellulose ester, wherein at least one hydroxyl group of the phosphoethanolamine cellulose is esterified, for example, with an organic acid, acid anhydride, or acid chloride, or an inorganic acid. Exemplary organic acids include acetic acid, propanoic acid, and butyric acid. Exemplary inorganic acids include nitric acid and sulfuric acid.

In another aspect, the invention includes a composition comprising a phosphoethanolamine cellulose ether, wherein at least one hydroxyl group of the phosphoethanolamine cellulose is etherified. In certain embodiments, the phosphoethanolamine cellulose ether is an alkyl ether, a hydroxyalkyl ether, or a carboxyalkyl ether.

In another aspect, the invention includes a composition comprising a phosphoethanolamine cellulose, wherein at least one amine group is chemically modified. For example, at least one amine group may be alkylated, acylated, or sulfonated. In another embodiment, at least one amine group is conjugated to an agent. In certain embodiments, the agent is a peptide, antibody, enzyme, nucleic acid, dye, ligand, or drug.

In another aspect, the invention includes a method of hydrolyzing a phosphoethanolamine cellulose, the method comprising contacting the phosphoethanolamine cellulose with one or more cellulases. For example, endocellulases, exocellulases, beta-glucosidases, oxidative cellulases, cellulose phosphorylases, or a combination thereof may be used in hydrolysis of a phosphoethanolamine cellulose.

These and other embodiments of the subject invention will readily occur to those of skill in the art in view of the disclosure herein.

BRIEF DESCRIPTION OF THE FIGURES

FIGS. 1A-1D show that E. coli produces phosphoethanolamine cellulose. FIG. 1A shows the chemical structure representation of glucose and pEtN glucose units in pEtN cellulose. FIG. 1B shows the ¹³C CPMAS solid-state NMR spectra of the pure cellulosic material isolated without the use of Congo red (CR) in the growth medium (top) and with CR (bottom). The pure CR spectrum is provided as an overlay (dashed gray line). The comparison demonstrates that purification with CR does not influence the polysaccharide composition. FIG. 1C shows that the C6′ and C7 carbon chemical shift region exhibited the strongest dephasing in the 1-ms C{P} REDOR NMR measurement, followed next by the C5′ and C8 carbons. FIG. 1D shows that the ¹³C CPMAS spectrum of the cellulosic material isolated from the bcsG derivative lacked modification carbons and contained only the ¹³C chemical shifts expected for standard amorphous cellulose.

FIGS. 2A-2C show that phosphoethanolamine cellulose is detected in curli-integrated E. coli biofilm matrices with isotopic serine labeling and is also produced by Salmonella enterica. FIG. 2A shows that isotopic labeling with L-[3-¹³C]serine-supplemented YESCA nutrient medium resulted in enrichment of the pEtN cellulose C7 carbon in an isolated pEtN sample, consistent with routing through a possible substrate such as phosphatidylethanolamine. FIG. 2B shows that isotopic labeling with L-[¹⁵N]serine was evaluated by ¹⁵N CPMAS NMR on ECM samples containing both curli and cellulosic material. Amide ¹⁵N signals correspond to curli amides. The loss of the amine ¹⁵N signal in the bcsG derivative assigned the amine nitrogen to pEtN cellulose. Loss of the modification was accompanied by loss of the wrinkled macrocolony morphology (inset photographs). FIG. 2C shows that the ¹³C CPMAS spectrum of the cellulosic material isolated from Salmonella enterica serovar Typhimurium strain IR715ΔcsgBA matched that of pEtN cellulose from AR3110ΔcsgBA.

FIG. 3 shows that the phosphoethanolamine cellulose spectrum from AR3110ΔcsgBA is nearly identical to that from the uropathogenic E. coli derivative UTI89ΔcsgA. 125-MHz ¹³C CPMAS NMR spectra were obtained for the modified cellulose samples obtained from cells grown on CR-supplemented YESCA agar. Magic-angle spinning was performed at 7143 Hz.

FIG. 4 shows that C{N} REDOR NMR confirmed that the C8 carbon is adjacent to a nitrogen in the AR3110ΔcsgBA modified cellulose. 125-MHz C{N} REDOR was performed with an evolution time of 2.2 ms to identify carbons directly bonded to nitrogen. The REDOR difference (DS) spectrum confirmed that the C8 carbon at 41 ppm exhibited complete dephasing as expected for directly bonded CN pairs. Magic-angle spinning was performed at 7143 Hz.

FIG. 5 shows that ³¹P CPMAS NMR revealed the presence of phosphorus in the modified cellulose. 202-MHz ³¹P CPMAS NMR was performed to identify the presence of phosphorus in the modified cellulose isolated from AR3110ΔcsgBA. The ³¹P centerband is centered at 1.2 ppm, referenced to calf thymus DNA, consistent with a phosphate species. Spinning sidebands are observed at multiples of the spinning frequency. Magic-angle spinning was performed at 8000 Hz.

FIGS. 6A and 6B show that solution-state NMR of the intact pEtN cellulose revealed observable ¹H signals consistent with a phosphoethanolamine modification. ¹H NMR resonances for the ¹H's attached to C1 and C2 carbons in pEtN cellulose (FIG. 6B) exhibit the splitting patterns expected for phosphoethanolamine (FIG. 6A). ¹H NMR was performed at 600 MHz on a Varian solution-state NMR spectrometer.

FIGS. 7A-7D show solution-state ¹³C NMR spectral comparisons assigning the contributions to the acid-digested pEtN cellulose ¹³C spectrum. 150-MHz ¹³C solution-state NMR spectra of ethanolamine (FIG. 7A), glucose (FIG. 7B), and glucose-6-phosphate (FIG. 7C) were obtained to provide spectral comparisons with acid-digested pEtN cellulose (FIG. 7D). Acid digestion of pEtN cellulose resulted in cleavage of the cellulosic polymer and loss of ethanolamine with carbon signals consistent with glucose-6-phosphate and ethanolamine in solution. Spectra were referenced to DSS.

FIGS. 8A-8D show solution-state ¹H NMR spectral comparisons assigning the dominant contributions to the acid-digested pEtN cellulose ¹³C spectrum. 600-MHz ¹H solution-state NMR spectra of ethanolamine (FIG. 8A), glucose (FIG. 8B), and glucose-6-phosphate (FIG. 8C) were obtained to provide spectral comparisons with acid-digested pEtN cellulose (FIG. 8D). Acid digestion of pEtN cellulose resulted in cleavage of the cellulosic polymer and loss of ethanolamine with carbon signals consistent with glucose-6-phosphate and ethanolamine in solution.

FIG. 9 shows ¹H-¹H COSY NMR. The solution-state ¹H-¹H COSY NMR spectrum of the acid-digested pEtN cellulose indicates the presence of glucose-6-phosphate, ethanolamine, and a minor amount of glucose. The spectrum was referenced to DSS.

FIG. 10 shows ¹H-¹³C HSQC NMR. The solution-state ¹H-¹³C HSQC NMR spectrum of the acid-digested pEtN cellulose indicates the presence of glucose-6-phosphate, ethanolamine, and a minor amount of glucose. The spectrum was referenced to DSS.

FIG. 11 shows that quantitative CP measurements yield a 1.9:1 ratio for the C1:C8 carbon peaks in pEtN cellulose. 125-MHz ¹³C CPMAS NMR was performed on the pure pEtN cellulose sample shown in FIG. 1B (without the inclusion of CR) as a function of the contact time to extract the maximum magnetization through CP transfer without relaxation. The C1 and C8 carbons are the most well resolved and well suited for the quantitative CP measurements. CP behavior, with decay due to T1ρ, was similar for the C1 and C8 carbons and yielded a C1:C8 ratio of 1.9:1. Thus, approximately one half of the glucose units are modified with the phosphoethanolamine group. Magic-angle spinning was performed at 7143 Hz.

FIG. 12 shows that complementation with bcsG in the AR3110ΔcsgBAΔbcsG derivative restores the modification to pEtN cellulose. A 125-MHz ¹³C CPMAS NMR spectral comparison revealed that complementation of bcsG in the AR3110ΔcsgBAΔbcsG derivative restored the phosphoethanolamine modification to the same extent as present in AR3110ΔcsgBA. Cellulosic material was prepared from bacteria grown on CR-supplemented agar as in FIG. 1B. Magic-angle spinning was performed at 7143 Hz.

FIGS. 13A-13D show that the bcsE and bcsF genes contribute to the extent of modification of pEtN cellulose. Cellulosic material was prepared from AR3110ΔcsgBAΔbcsE (FIG. 13B) and AR3110ΔcsgBAΔbcsF (FIG. 13C) grown on CR-supplemented agar as in FIG. 1B. Results were compared with AR3110ΔcsgBA (FIG. 13A) and AR3110ΔcsgBAΔbcsG (FIG. 13D) from FIG. 1. Magic-angle spinning was performed at 7143 Hz.

FIG. 14 shows that mutations in bcsE, bcsF and bcsG affect macroscopic morphology of macrocolonies. Macrocolonies of the cellulose-free strain W3110 and of strain AR3110, which produces both cellulose and curli fibers, and their indicated bcsE, bcsF and bcsG derivatives were grown for 48 hours on YESCA agar plates.

FIG. 15 shows the production of unmodified cellulose by E. coli lacking the bcsG gene. The AR3110ΔcsgBAΔbcsG mutant made significant quantities of unmodified cellulose when cells were grown on nutrient agar medium. Congo red served as an indicator of cellulose production.

FIG. 16 shows the solubility of phosphoethanolamine cellulose in water. The solubility of purified phosphoethanolamine cellulose from E. coli was compared with commercially available crystalline cellulose and commercially available carboxymethyl cellulose (produced chemically), the latter known to be soluble in water and highly digestible by cellulases.

FIG. 17 shows that phosphoethanolamine cellulose, produced by E. coli, is more digestible by cellulase than commercial crystalline cellulose. Aspergillus niger cellulase was used for enzymatic hydrolysis of phosphoethanolamine cellulose, crystalline cellulose, and commercially available carboxymethyl cellulose. Glucose was detected with a standard hexokinase assay.

FIG. 18 shows a bacterial biofilm pellicle assay of wild-type and bcsG mutant cells. The normal wild-type E. coli extracellular matrix (ECM), which contains the modified pEtN cellulose and curli, allows an overall hydrophobic material composed of cells and the ECM to assemble and be maintained at the air-liquid interface. This function is lost without the pEtN modification. The bcsG mutant cellulose (unmodified) with curli is different. Although the bcsG mutant forms a type of mesh, it sinks to the bottom of the dish and does not provide a strong network. Thus, the modified cellulose is able to promote the formation of a more hydrophobic material at the air-liquid interface (hydrophobic-hydrophilic interface).
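The arithmetic behind the C1:C8 ratio described for FIG. 11 can be sketched as follows. This is only an illustration, assuming that each glucose unit contributes one C1 carbon and each phosphoethanolamine group contributes one C8 carbon; the function name `modified_fraction` is hypothetical and is not part of the disclosure.

```python
def modified_fraction(c1_to_c8_ratio: float) -> float:
    """Estimate the fraction of glucose units carrying a pEtN group.

    Assumption (for illustration): one C1 carbon per glucose unit and one
    C8 carbon per phosphoethanolamine group, so the fraction of modified
    units is the C8 intensity divided by the C1 intensity.
    """
    if c1_to_c8_ratio <= 0:
        raise ValueError("C1:C8 ratio must be positive")
    return 1.0 / c1_to_c8_ratio


# A C1:C8 ratio of 1.9:1 corresponds to roughly one half of the units.
fraction = modified_fraction(1.9)
print(f"{fraction:.2f}")  # 0.53
```

Under these assumptions, a ratio of 1.9:1 gives a modified fraction of about 0.53, consistent with the statement that approximately one half of the glucose units are modified.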

DETAILED DESCRIPTION

The practice of the present invention will employ, unless otherwise indicated, conventional methods of chemistry, biology, biochemistry, and molecular biology and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., J. Wertz, J. P. Mercier, and O. Bedue, Cellulose Science and Technology (Fundamental Sciences: Chemistry, EPFL Press, 2010); T. Wuestenberg, Cellulose and Cellulose Derivatives in the Food Industry: Fundamentals and Applications (Wiley-VCH, 2014); Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (3rd Edition, 2001); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.).

All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entireties.

I. Definitions

In describing the present invention, the following terms will be employed, and are intended to be defined as indicated below.

It must be noted that, as used in this specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a cell” includes two or more cells, and the like.

The term “about,” particularly in reference to a given quantity, is meant to encompass deviations of plus or minus five percent.
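As a small illustration of this definition, the range covered by “about” a stated quantity can be computed as below. This is a sketch only; the helper names `about_range` and `is_about` are hypothetical, and the five percent tolerance is taken directly from the definition above.

```python
from typing import Tuple


def about_range(nominal: float, tolerance: float = 0.05) -> Tuple[float, float]:
    """Return the (low, high) bounds for "about" a nominal quantity.

    The default tolerance of 0.05 reflects the plus or minus five percent
    deviation stated in the definition.
    """
    delta = abs(nominal) * tolerance
    return nominal - delta, nominal + delta


def is_about(measured: float, nominal: float, tolerance: float = 0.05) -> bool:
    """Check whether a measured value falls within "about" the nominal value."""
    low, high = about_range(nominal, tolerance)
    return low <= measured <= high


# Example: "about 25" spans 23.75 to 26.25 under this definition.
low, high = about_range(25.0)
print(round(low, 2), round(high, 2))
print(is_about(26.0, 25.0), is_about(27.0, 25.0))
```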

The terms “polypeptide” and “protein” refer to a polymer of amino acid residues and are not limited to a minimum length. Thus, peptides, oligopeptides, dimers, multimers, and the like, are included within the definition. Both full length proteins and fragments thereof are encompassed by the definition. The terms also include postexpression modifications of the polypeptide, for example, glycosylation, acetylation, phosphorylation, hydroxylation, and the like. Furthermore, for purposes of the present invention, a “polypeptide” refers to a protein which includes modifications, such as deletions, additions and substitutions to the native sequence, so long as the protein maintains the desired activity. These modifications may be deliberate, as through site directed mutagenesis, or may be accidental, such as through mutations of hosts which produce the proteins or errors due to PCR amplification.

The term “BcsG phosphoethanolamine transferase” as used herein encompasses BcsG encoded phosphoethanolamine transferases from any bacterial species, and also includes biologically active fragments, variants, analogs, and derivatives thereof that retain BcsG phosphoethanolamine transferase activity (i.e., catalyze transfer of a phosphoethanolamine group to a cellulose hydroxyl group to produce a phosphoethanolamine-modified cellulose).

A BcsG polynucleotide, nucleic acid, oligonucleotide, protein, polypeptide, or peptide refers to a molecule derived from any source. The molecule need not be physically derived from an organism, but may be synthetically or recombinantly produced. BcsG sequences from a number of bacterial species are well known in the art. Representative sequences are presented for BcsG from Escherichia coli (SEQ ID NO:12) and Salmonella enterica (SEQ ID NO:13), and additional representative sequences are listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries: NP_417995, NC_000913, YP_002414689, WP_049093031, WP_049185400, WP_049125481, WP_049016646, WP_032425312, WP_057517249, WP_054376814, WP_050272434, WP_050967202, WP_050950257, WP_020978599, WP_020937258, WP_000192030, WP_001541082, WP_088744589, WP_085416812, WP_085347572, WP_052994086, WP_052992671, WP_052992608, WP_052982079, WP_052973396, YP_001008219, YP_206845, NP_744776, WP_048206881, WP_023291299, WP_060082415, WP_049591480, WP_049325226, WP_049300323, WP_049267920, and WP_049217448; all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference. Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for cellulose modification, as described herein, wherein the variant retains biological activity, such as BcsG phosphoethanolamine transferase activity.

By “fragment” is intended a molecule consisting of only a part of the intact full-length sequence and structure. The fragment can include a C-terminal deletion, an N-terminal deletion, and/or an internal deletion of the polypeptide. Active fragments of a particular protein or polypeptide will generally include at least about 5-10 contiguous amino acid residues of the full length molecule, preferably at least about 15-25 contiguous amino acid residues of the full length molecule, and most preferably at least about 20-50 or more contiguous amino acid residues of the full length molecule, or any integer between 5 amino acids and the full length sequence, provided that the fragment in question retains biological activity, such as BcsG phosphoethanolamine transferase activity.

“Substantially purified” generally refers to isolation of a substance (compound, cellulose or modified cellulose, oligosaccharide, monosaccharide, disaccharide, polysaccharide, polynucleotide, nucleic acid, protein, polypeptide, or peptide) such that the substance comprises the majority percent of the sample in which it resides. Typically in a sample, a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90%-95% of the sample. Techniques for purifying cellulose, saccharides, polynucleotides, and polypeptides of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density.

By “isolated” is meant, when referring to a cellulose or modified cellulose, oligosaccharide, monosaccharide, disaccharide, polysaccharide, or polypeptide, that the indicated molecule is separate and discrete from the whole organism with which the molecule is found in nature or is present in the substantial absence of other biological macromolecules of the same type. The term “isolated” with respect to a polynucleotide is a nucleic acid molecule devoid, in whole or part, of sequences normally associated with it in nature; or a sequence, as it exists in nature, but having heterologous sequences in association therewith; or a molecule disassociated from the chromosome.

The terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule” are used herein to include a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, the term includes triple-, double- and single-stranded DNA, as well as triple-, double- and single-stranded RNA. It also includes modifications, such as by methylation and/or by capping, and unmodified forms of the polynucleotide. More particularly, the terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule” include polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), any other type of polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base, and other polymers containing nonnucleotidic backbones, for example, polyamide (e.g., peptide nucleic acids (PNAs)) and polymorpholino (commercially available from the Anti-Virals, Inc., Corvallis, Oreg., as Neugene) polymers, and other synthetic sequence-specific nucleic acid polymers providing that the polymers contain nucleobases in a configuration which allows for base pairing and base stacking, such as is found in DNA and RNA. There is no intended distinction in length between the terms “polynucleotide,” “oligonucleotide,” “nucleic acid” and “nucleic acid molecule,” and these terms will be used interchangeably. Thus, these terms include, for example, 3′-deoxy-2′,5′-DNA, oligodeoxyribonucleotide N3′ P5′ phosphoramidates, 2′-O-alkyl-substituted RNA, double- and single-stranded DNA, as well as double- and single-stranded RNA, microRNA, DNA:RNA hybrids, and hybrids between PNAs and DNA or RNA, and also include known types of modifications, for example, labels which are known in the art, methylation, “caps,” substitution of one or more of the naturally occurring nucleotides with an analog (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), with negatively charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), and with positively charged linkages (e.g., aminoalkylphosphoramidates, aminoalkylphosphotriesters), those containing pendant moieties, such as, for example, proteins (including nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), those containing chelators (e.g., metals, radioactive metals, boron, oxidative metals, etc.), those containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids, etc.), as well as unmodified forms of the polynucleotide or oligonucleotide. The term also includes locked nucleic acids (e.g., comprising a ribonucleotide that has a methylene bridge between the 2′-oxygen atom and the 4′-carbon atom). See, for example, Kurreck et al. (2002) Nucleic Acids Res. 30: 1911-1918; Elayadi et al. (2001) Curr. Opinion Invest. Drugs 2: 558-561; Orum et al. (2001) Curr. Opinion Mol. Ther. 3: 239-243; Koshkin et al. (1998) Tetrahedron 54: 3607-3630; Obika et al. (1998) Tetrahedron Lett. 39: 5401-5404.

“Homology” refers to the percent identity between two polynucleotide or two polypeptide molecules. Two nucleic acid, or two polypeptide sequences are “substantially homologous” to each other when the sequences exhibit at least about 50% sequence identity, preferably at least about 75% sequence identity, more preferably at least about 80%-85% sequence identity, more preferably at least about 90% sequence identity, and most preferably at least about 95%-98% sequence identity over a defined length of the molecules. As used herein, substantially homologous also refers to sequences showing complete identity to the specified sequence.

In general, “identity” refers to an exact nucleotide to nucleotide or amino acid to amino acid correspondence of two polynucleotides or polypeptide sequences, respectively. Percent identity can be determined by a direct comparison of the sequence information between two molecules by aligning the sequences, counting the exact number of matches between the two aligned sequences, dividing by the length of the shorter sequence, and multiplying the result by 100. Readily available computer programs can be used to aid in the analysis, such as ALIGN, Dayhoff, M. O. in Atlas of Protein Sequence and Structure M. O. Dayhoff ed., 5 Suppl. 3:353-358, National Biomedical Research Foundation, Washington, D.C., which adapts the local homology algorithm of Smith and Waterman Advances in Appl. Math. 2:482-489, 1981 for peptide analysis. Programs for determining nucleotide sequence identity are available in the Wisconsin Sequence Analysis Package, Version 8 (available from Genetics Computer Group, Madison, Wis.) for example, the BESTFIT, FASTA and GAP programs, which also rely on the Smith and Waterman algorithm. These programs are readily utilized with the default parameters recommended by the manufacturer and described in the Wisconsin Sequence Analysis Package referred to above. For example, percent identity of a particular nucleotide sequence to a reference sequence can be determined using the homology algorithm of Smith and Waterman with a default scoring table and a gap penalty of six nucleotide positions.
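The direct-comparison calculation described above (count exact matches between two aligned sequences, divide by the length of the shorter sequence, multiply by 100) can be sketched as follows. This is a minimal illustration, not the ALIGN or Wisconsin Package implementation; it assumes the two sequences have already been aligned and padded with a gap character, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Matches are counted column by column (gap columns never count as
    matches), then divided by the ungapped length of the shorter sequence
    and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Six matching columns; the shorter sequence has seven residues.
print(f"{percent_identity('ACGTACGT', 'ACG-ACGA'):.1f}")  # 85.7
```

A value computed this way could then be compared against a chosen threshold, for example the 70% lower bound recited for BcsG variants.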

Another method of establishing percent identity in the context of the present invention is to use the MPSRCH package of programs copyrighted by the University of Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, Calif.). From this suite of packages, the Smith-Waterman algorithm can be employed where default parameters are used for the scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a gap of six). From the data generated the “Match” value reflects “sequence identity.” Other suitable programs for calculating the percent identity or similarity between sequences are generally known in the art, for example, another alignment program is BLAST, used with default parameters. For example, BLASTN and BLASTP can be used using the following default parameters: genetic code=standard; filter=none; strand=both; cutoff=60; expect=10; Matrix=BLOSUM62; Descriptions=50 sequences; sort by=HIGH SCORE; Databases=non-redundant, GenBank+EMBL+DDBJ+PDB+GenBank CDS translations+Swiss protein+Spupdate+PIR. Details of these programs are readily available.
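For orientation, the Smith-Waterman local alignment score can be sketched in a few lines. This is a simplified illustration and not the MPSRCH implementation: it uses a single linear gap penalty rather than the separate gap open and gap extension penalties mentioned above, and the match, mismatch, and gap values are arbitrary assumptions chosen only for the example.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b.

    Simplification: a linear gap penalty is used. Scoring values are
    illustrative assumptions, not defaults of any named program.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Identical four-residue sequences score 4 matches x 2 = 8.
print(smith_waterman_score("ACGT", "ACGT"))  # 8
```

Only two rows of the matrix are kept, which is enough for the score alone; recovering the alignment itself would require storing the full matrix and tracing back from the highest-scoring cell.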

Alternatively, homology can be determined by hybridization of polynucleotides under conditions which form stable duplexes between homologous regions, followed by digestion with single stranded specific nuclease(s), and size determination of the digested fragments. DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.

“Recombinant” as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by virtue of its origin or manipulation, is not associated with all or a portion of the polynucleotide with which it is associated in nature. The term “recombinant” as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide. In general, the gene of interest is cloned and then expressed in transformed organisms, as described further below. The host organism expresses the foreign gene to produce the protein under expression conditions.

The term “transformation” refers to the insertion of an exogenous polynucleotide into a host cell, irrespective of the method used for the insertion. For example, direct uptake, transduction or f-mating are included. The exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host genome.

“Recombinant host cells”, “host cells,” “cells”, “cell lines,” “cell cultures,” and other such terms denoting microorganisms or higher eukaryotic cell lines cultured as unicellular entities refer to cells which can be, or have been, used as recipients for recombinant vector or other transferred DNA, and include the original progeny of the original cell which has been transfected.

A “coding sequence” or a sequence which “encodes” a selected polypeptide, is a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case of mRNA) into a polypeptide in vivo when placed under the control of appropriate regulatory sequences (or “control elements”). The boundaries of the coding sequence can be determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxy) terminus. A coding sequence can include, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic DNA sequences from viral or prokaryotic DNA, and even synthetic DNA sequences. A transcription termination sequence may be located 3′ to the coding sequence. Typical “control elements,” include, but are not limited to, transcription promoters, transcription enhancer elements, transcription termination signals, polyadenylation sequences (located 3′ to the translation stop codon), sequences for optimization of initiation of translation (located 5′ to the coding sequence), and translation termination sequences.
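The boundary rule stated above (a start codon at the 5′ end and an in-frame translation stop codon at the 3′ end) can be illustrated with a short sketch. It is deliberately simplified: it searches only the given strand, treats ATG as the sole start codon (bacteria can also use alternative start codons), and uses the three stop codons of the standard genetic code. The function name `find_coding_sequence` is hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_coding_sequence(dna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first ATG-to-stop open reading frame.

    Indices are 0-based and end is exclusive, so dna[start:end] begins
    with ATG and ends with the in-frame stop codon. Returns None when no
    such frame exists on the given strand.
    """
    seq = dna.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3
        start = seq.find("ATG", start + 1)
    return None


bounds = find_coding_sequence("CCATGAAATTTTAGGG")
print(bounds)  # (2, 14)
```

Here the returned span covers `ATGAAATTTTAG`, that is, the start codon, two sense codons, and the stop codon.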

“Operably linked” refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function. Thus, a given promoter operably linked to a coding sequence is capable of effecting the expression of the coding sequence when the proper enzymes are present. The promoter need not be contiguous with the coding sequence, so long as it functions to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between the promoter sequence and the coding sequence and the promoter sequence can still be considered “operably linked” to the coding sequence.

“Expression cassette” or “expression construct” refers to an assembly which is capable of directing the expression of the sequence(s) or gene(s) of interest. An expression cassette generally includes control elements, as described above, such as a promoter which is operably linked to (so as to direct transcription of) the sequence(s) or gene(s) of interest, and often includes a polyadenylation sequence as well. Within certain embodiments of the invention, the expression cassette described herein may be contained within a donor polynucleotide, plasmid, or viral vector construct. In addition to the components of the expression cassette, the construct may also include one or more selectable markers, a signal which allows the construct to exist as single stranded DNA (e.g., an M13 origin of replication), at least one multiple cloning site, and a “mammalian” origin of replication (e.g., an SV40 or adenovirus origin of replication).

“Purified polynucleotide” refers to a polynucleotide of interest or fragment thereof which is essentially free, e.g., contains less than about 50%, preferably less than about 70%, and more preferably less than about at least 90%, of the protein with which the polynucleotide is naturally associated. Techniques for purifying polynucleotides of interest are well-known in the art and include, for example, disruption of the cell containing the polynucleotide with a chaotropic agent and separation of the polynucleotide(s) and proteins by ion-exchange chromatography, affinity chromatography and sedimentation according to density.

The term “transfection” is used to refer to the uptake of foreign DNA by a cell. A cell has been “transfected” when exogenous DNA has been introduced inside the cell membrane. A number of transfection techniques are generally known in the art. See, e.g., Graham et al. (1973) Virology, 52:456, Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13:197. Such techniques can be used to introduce one or more exogenous DNA moieties into suitable host cells. The term refers to both stable and transient uptake of the genetic material, and includes uptake of peptide- or antibody-linked DNAs.

A “vector” is capable of transferring nucleic acid sequences to target cells (e.g., viral vectors, non-viral vectors, particulate carriers, and liposomes). Typically, “vector construct,” “expression vector,” and “gene transfer vector,” mean any nucleic acid construct capable of directing the expression of a nucleic acid of interest and which can transfer nucleic acid sequences to target cells. Thus, the term includes cloning and expression vehicles, as well as plasmid and viral vectors.

The terms “variant,” “analog” and “mutein” refer to biologically active derivatives of the reference molecule that retain desired activity, such as site-directed BcsG phosphoethanolamine transferase activity. In general, the terms “variant” and “analog” refer to compounds having a native polypeptide sequence and structure with one or more amino acid additions, substitutions (generally conservative in nature) and/or deletions, relative to the native molecule, so long as the modifications do not destroy biological activity and which are “substantially homologous” to the reference molecule as defined below. In general, the amino acid sequences of such analogs will have a high degree of sequence homology to the reference sequence, e.g., amino acid sequence homology of more than 50%, generally more than 60%-70%, even more particularly 80%-85% or more, such as at least 90%-95% or more, when the two sequences are aligned. Often, the analogs will include the same number of amino acids but will include substitutions, as explained herein. The term “mutein” further includes polypeptides having one or more amino acid-like molecules including but not limited to compounds comprising only amino and/or imino molecules, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring (e.g., synthetic), cyclized, branched molecules and the like. The term also includes molecules comprising one or more N-substituted glycine residues (a “peptoid”) and other synthetic amino acids or peptides. (See, e.g., U.S. Pat. Nos. 5,831,005; 5,877,278; and U.S. Pat. No. 5,977,301; Nguyen et al., Chem. Biol. (2000) 7:463-473; and Simon et al., Proc. Natl. Acad. Sci. USA (1992) 89:9367-9371 for descriptions of peptoids). Methods for making polypeptide analogs and muteins are known in the art and are described further below.

As explained above, analogs generally include substitutions that are conservative in nature, i.e., those substitutions that take place within a family of amino acids that are related in their side chains. Specifically, amino acids are generally divided into four families: (1) acidic: aspartate and glutamate; (2) basic: lysine, arginine, histidine; (3) non-polar: alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar: glycine, asparagine, glutamine, cysteine, serine, threonine, and tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified as aromatic amino acids. For example, it is reasonably predictable that an isolated replacement of leucine with isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar conservative replacement of an amino acid with a structurally related amino acid, will not have a major effect on the biological activity. For example, the polypeptide of interest may include up to about 5-10 conservative or non-conservative amino acid substitutions, or even up to about 15-25 conservative or non-conservative amino acid substitutions, or any integer between 5-25, so long as the desired function of the molecule remains intact. One of skill in the art may readily determine regions of the molecule of interest that can tolerate change by reference to Hopp/Woods and Kyte-Doolittle plots, well known in the art.
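By way of illustration only, the four-family grouping above can be expressed as a short Python sketch that labels a substitution as conservative when both residues fall in the same family. The function names and the use of one-letter residue codes are assumptions made for this example and are not part of the disclosure; the sketch says nothing about whether a given substitution actually preserves activity.

```python
from typing import Tuple

# Side-chain families as listed in the text above, in one-letter codes.
FAMILIES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "non-polar": set("AVLIPFMW"),
    "uncharged polar": set("GNQCSTY"),
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter amino acid code."""
    for name, members in FAMILIES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues belong to the same side-chain family."""
    return family_of(original) == family_of(replacement)


def count_substitutions(reference: str, variant: str) -> Tuple[int, int]:
    """Count (conservative, non-conservative) substitutions between two
    equal-length sequences compared position by position."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    conservative = non_conservative = 0
    for ref_res, var_res in zip(reference, variant):
        if ref_res.upper() == var_res.upper():
            continue  # unchanged position
        if is_conservative(ref_res, var_res):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative
```

For instance, `is_conservative("L", "I")` corresponds to the leucine-to-isoleucine replacement mentioned above, and `count_substitutions` could be used to tally substitutions in a candidate analog against the 5-25 range described.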

“Gene transfer” or “gene delivery” refers to methods or systems for reliably inserting DNA or RNA of interest into a host cell. Such methods can result in transient expression of non-integrated transferred DNA, extrachromosomal replication and expression of transferred replicons (e.g., episomes), or integration of transferred genetic material into the genomic DNA of host cells. Gene delivery expression vectors include, but are not limited to, vectors derived from bacterial plasmid vectors, viral vectors, non-viral vectors, adenoviruses, retroviruses, alphaviruses, pox viruses, and vaccinia viruses.

The term “derived from” is used herein to identify the original source of a molecule but is not meant to limit the method by which the molecule is made which can be, for example, by chemical synthesis or recombinant means.

A polynucleotide “derived from” a designated sequence refers to a polynucleotide sequence which comprises a contiguous sequence of approximately at least about 6 nucleotides, preferably at least about 8 nucleotides, more preferably at least about 10-12 nucleotides, and even more preferably at least about 15-20 nucleotides corresponding, i.e., identical or complementary to, a region of the designated nucleotide sequence. The derived polynucleotide will not necessarily be derived physically from the nucleotide sequence of interest, but may be generated in any manner, including, but not limited to, chemical synthesis, replication, reverse transcription or transcription, which is based on the information provided by the sequence of bases in the region(s) from which the polynucleotide is derived. As such, it may represent either a sense or an antisense orientation of the original polynucleotide.
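The contiguous-stretch criterion above can be illustrated with a minimal Python sketch. It assumes DNA sequences written 5′ to 3′ in the letters A, C, G, and T, and it reads “complementary” as a match to the reverse complement of the designated sequence; both the function names and that reading are assumptions made for this example.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def shares_contiguous_run(candidate: str, designated: str, min_len: int = 15) -> bool:
    """True if `candidate` contains at least `min_len` contiguous nucleotides
    that are identical to, or the reverse complement of, a region of
    `designated`."""
    candidate = candidate.upper()
    designated = designated.upper()
    targets = (designated, reverse_complement(designated))
    # Slide a window of min_len across the candidate and look it up in
    # both orientations of the designated sequence.
    for start in range(len(candidate) - min_len + 1):
        window = candidate[start:start + min_len]
        if any(window in target for target in targets):
            return True
    return False
```

The `min_len` argument can be set to 6, 8, 10-12, or 15-20 to reflect the thresholds recited above; the default of 15 is an arbitrary choice for the sketch.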

II. Modes of Carrying Out the Invention

Before describing the present invention in detail, it is to be understood that this invention is not limited to particular formulations or process parameters as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments of the invention only, and is not intended to be limiting.

Although a number of methods and materials similar or equivalent to those described herein can be used in the practice of the present invention, the preferred materials and methods are described herein.

The present invention is based on the discovery of phosphoethanolamine cellulose and the genetic and molecular basis for its production in bacteria. The bcsEFG operon is part of a cellulose gene cluster implicated in cellulose biosynthesis in E. coli and other Gram-negative bacteria. In particular, the inventors have shown that the phosphoethanolamine modification of cellulose depends on a phosphoethanolamine transferase encoded by the BcsG gene. BcsE and BcsF play accessory and possibly regulatory roles, their presence increasing the extent of cellulose modification and the amount of modified cellulose produced (see Examples). The invention further relates to the production, derivatization, and methods of using phosphoethanolamine cellulose.

A. Production of Phosphoethanolamine Cellulose

Phosphoethanolamine cellulose can be prepared in any suitable manner (e.g., biosynthetically, purification from cell culture, or chemical synthesis, etc.). In one embodiment, phosphoethanolamine cellulose is produced biosynthetically by expression of BcsG phosphoethanolamine transferase in a cellulose-producing host, wherein the expressed BcsG phosphoethanolamine transferase catalyzes phosphoethanolamine transfer to hydroxyl groups on cellulose produced by the host. In some embodiments, BcsG is coexpressed with BcsE and BcsF to increase the extent of cellulose modification and/or amount of cellulose production in the host. The amount of phosphoethanolamine cellulose produced may be further increased by providing cyclic di-GMP to upregulate cellulose production in the host. Phosphoethanolamine cellulose, produced by the methods described herein, can be recovered from host cells and further purified if desired. Phosphoethanolamine cellulose is preferably prepared in substantially pure form (i.e., substantially free from other host cell components and other contaminants).

Suitable hosts for production of phosphoethanolamine cellulose include bacteria, plants, and algae, or any other type of organism or cell capable of producing cellulose that can be modified by the BcsG phosphoethanolamine transferase. In some embodiments, phosphoethanolamine cellulose is produced biosynthetically by Gram-negative bacteria, such as, but not limited to, bacteria of the Acetobacter (e.g., Acetobacter xylinum), Agrobacterium, Escherichia (e.g., Escherichia coli), or Salmonella (e.g., Salmonella enterica) genera.

Any BcsG phosphoethanolamine transferase from any bacterial species, or a biologically active fragment, variant, analog, or derivative thereof that retains BcsG phosphoethanolamine transferase activity (i.e., catalyzes transfer of a phosphoethanolamine group to a cellulose hydroxyl group) may be used to produce a phosphoethanolamine-modified cellulose. The BcsG phosphoethanolamine transferase need not be physically derived from an organism, but may be synthetically or recombinantly produced. Representative sequences are presented for BcsG from Escherichia coli (SEQ ID NO:12) and Salmonella enterica (SEQ ID NO:13), and additional representative sequences are listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries: NP_417995, NC_000913, YP_002414689, WP_049093031, WP_049185400, WP_049125481, WP_049016646, WP_032425312, WP_057517249, WP_054376814, WP_050272434, WP_050967202, WP_050950257, WP_020978599, WP_020937258, WP_000192030, WP_001541082, WP_088744589, WP_085416812, WP_085347572, WP_052994086, WP_052992671, WP_052992608, WP_052982079, WP_052973396, YP_001008219, YP_206845, NP_744776, WP_048206881, WP_023291299, WP_060082415, WP_049591480, WP_049325226, WP_049300323, WP_049267920, and WP_049217448; all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference. Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for cellulose modification, as described herein, wherein the variant retains biological activity, such as BcsG phosphoethanolamine transferase activity.
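As a rough illustration of how a percent-identity threshold of this kind might be checked, the following Python sketch computes identity over two sequences that have already been aligned to equal length, with `-` marking gaps. The alignment itself is assumed to come from a separate tool, and the choice of denominator (all alignment columns that are not gaps in both sequences) is one of several conventions in use; the function names and the convention are assumptions made for this example, not a method stated in the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where both sequences carry a gap are ignored; a gap opposite
    a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no alignable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if percent identity is at or above the threshold (default 70%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Sequence identity alone does not establish that a variant retains BcsG phosphoethanolamine transferase activity; under the language above, activity remains a separate requirement.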

The BcsG phosphoethanolamine transferase, alone or in combination with the BcsE and BcsF-encoded proteins, can be used for modification of cellulose produced by a host. Nucleic acids comprising the BcsG, BcsE, or BcsF genes can be inserted into an expression vector to create an expression cassette capable of producing the encoded proteins in a suitable host cell. Numerous vectors are known in the art including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses. Thus, the term “vector” includes an autonomously replicating plasmid or a virus. For purposes of this application, the terms “expression construct,” “expression vector,” and “vector,” are used interchangeably to demonstrate the application of the invention in a general, illustrative sense, and are not intended to limit the invention. The BcsG, BcsE, and BcsF genes may be provided by a single vector or separate vectors. In one embodiment, the vector comprises a bcsEFG operon.

In certain embodiments, the nucleic acid encoding a polypeptide of interest (e.g., BcsG, BcsE, or BcsF-encoded polypeptide) is under transcriptional control of a promoter. A “promoter” refers to a DNA sequence recognized by the synthetic machinery of the cell, or introduced synthetic machinery, required to initiate the specific transcription of a gene. The term promoter will be used here to refer to a group of transcriptional control modules that are clustered around the initiation site for a bacterial RNA polymerase or eukaryotic RNA polymerase (e.g., RNA polymerase I, II, or III). Typical promoters for bacterial expression include the Tac, RecA, LacZ, pBAD, OXB1-20, OXB1, ctc, gsiB, Pspv, and T7 promoters (see, e.g., Goldstein et al. (1995) Biotechnol. Annu. Rev. 1:105-128). Examples of promoters for expression in plants include the CaMV 35S, Xa27, FMV, opine promoters, plant ubiquitin promoter (Ubi), rice actin 1 promoter (Act-1), maize alcohol dehydrogenase 1 promoter (Adh-1), and various other plant pathogen, synthetic, and native promoters (see, e.g., Liu et al. (2016) Curr. Opin. Biotechnol. 37:36-44, Dey et al. (2015) Planta 242(5):1077-1094, Jeong et al. (2015) J. Integr. Plant Biol. 57(11):913-924, Hernandez-Garcia et al. (2014) Plant Sci. 217-218:109-119). These and other promoters can be obtained from commercially available vectors, using techniques well known in the art. See, e.g., Sambrook et al., supra. Enhancer elements may be used in association with a promoter to increase expression levels of the constructs.

An expression vector for expressing BcsG, BcsE, or BcsF comprises a promoter “operably linked” to a polynucleotide comprising a BcsG, BcsE, or BcsF gene sequence. The phrase “operably linked” or “under transcriptional control” as used herein means that the promoter is in the correct location and orientation in relation to a polynucleotide to control the initiation of transcription by RNA polymerase and expression of the polynucleotide.

Typically, transcription terminator/polyadenylation signals may also be present in the expression construct. Bacterial terminator sequences may include Rho-independent or Rho-dependent transcription terminator sequences. Examples of eukaryotic terminator sequences include, but are not limited to, those derived from SV40, as described in Sambrook et al., supra, bovine growth hormone terminator sequence (see, e.g., U.S. Pat. No. 5,122,458), and plant terminator sequences such as the Agrobacterium nopaline synthase (NOS) terminator (see, e.g., International Patent Application Publication No. WO 2013/012729, Chung et al. (2005) Trends Plant Sci. 10(8):357-361). Additionally, 5′-UTR sequences can be placed adjacent to the coding sequence in order to enhance expression of the same. Such sequences may include UTRs comprising an internal ribosome entry site (IRES). Inclusion of an IRES permits the translation of one or more open reading frames from a vector. The IRES element attracts a eukaryotic ribosomal translation initiation complex and promotes translation initiation. See, e.g., Kaufman et al., Nuc. Acids Res. (1991) 19:4485-4490; Gurtu et al., Biochem. Biophys. Res. Comm. (1996) 229:295-298; Rees et al., BioTechniques (1996) 20:102-110; Kobayashi et al., BioTechniques (1996) 21:399-402; and Mosser et al., BioTechniques (1997) 22:150-161.

A multitude of IRES sequences are known and include sequences derived from a wide variety of viruses, such as from leader sequences of picornaviruses such as the encephalomyocarditis virus (EMCV) UTR (Jang et al. J. Virol. (1989) 63:1651-1660), the polio leader sequence, the hepatitis A virus leader, the hepatitis C virus IRES, human rhinovirus type 2 IRES (Dobrikova et al., Proc. Natl. Acad. Sci. (2003) 100(25):15125-15130), an IRES element from the foot and mouth disease virus (Ramesh et al., Nucl. Acid Res. (1996) 24:2697-2700), a giardiavirus IRES (Garlapati et al., J. Biol. Chem. (2004) 279(5):3389-3397), and the like. A variety of nonviral IRES sequences will also find use herein, including, but not limited to IRES sequences from yeast, as well as the human angiotensin II type 1 receptor IRES (Martin et al., Mol. Cell Endocrinol. (2003) 212:51-61), fibroblast growth factor IRESs (FGF-1 IRES and FGF-2 IRES, Martineau et al. (2004) Mol. Cell. Biol. 24(17):7622-7635), vascular endothelial growth factor IRES (Baranick et al. (2008) Proc. Natl. Acad. Sci. U.S.A. 105(12):4733-4738, Stein et al. (1998) Mol. Cell. Biol. 18(6):3112-3119, Bert et al. (2006) RNA 12(6):1074-1083), and insulin-like growth factor 2 IRES (Pedersen et al. (2002) Biochem. J. 363(Pt 1):37-44). These elements are readily commercially available in plasmids sold, e.g., by Clontech (Mountain View, Calif.), Invivogen (San Diego, Calif.), Addgene (Cambridge, Mass.) and GeneCopoeia (Rockville, Md.). See also IRESite: The database of experimentally verified IRES structures (iresite.org). An IRES sequence may be included in a vector, for example, to express BcsG in combination with BcsE and BcsF from an expression cassette.

Alternatively, a polynucleotide encoding a viral T2A peptide can be used to allow production of multiple protein products (e.g., BcsG, in combination with BcsE and BcsF) from a single vector. 2A linker peptides are inserted between the coding sequences in the multicistronic construct. The 2A peptide, which is self-cleaving, allows co-expressed proteins from the multicistronic construct to be produced at equimolar levels. 2A peptides from various viruses may be used, including, but not limited to 2A peptides derived from the foot-and-mouth disease virus, equine rhinitis A virus, Thosea asigna virus and porcine teschovirus-1. See, e.g., Kim et al. (2011) PLoS One 6(4):e18556, Trichas et al. (2008) BMC Biol. 6:40, Provost et al. (2007) Genesis 45(10):625-629, Furler et al. (2001) Gene Ther. 8(11):864-873; herein incorporated by reference in their entireties.

One of skill in the art can readily determine BcsG, BcsE, and BcsF nucleotide sequences using standard methodology and the teachings herein. Oligonucleotide probes can be devised based on the known sequences and used to probe genomic or cDNA libraries. The sequences can then be further isolated using standard techniques and, e.g., restriction enzymes employed to truncate the gene at desired portions of the full-length sequence. Similarly, sequences of interest can be isolated directly from cells containing the same, using known techniques, such as phenol extraction and the sequence further manipulated to produce the desired truncations. See, e.g., Sambrook et al., supra, for a description of techniques used to obtain and isolate DNA.

The BcsG, BcsE, and BcsF sequences can also be produced synthetically, for example, based on their known sequences. The nucleotide sequence can be designed with the appropriate codons for the particular amino acid sequence desired. The complete sequence is generally assembled from overlapping oligonucleotides prepared by standard methods and assembled into a complete coding sequence. See, e.g., Edge (1981) Nature 292:756; Nambair et al. (1984) Science 223:1299; Jay et al. (1984) J. Biol. Chem. 259:6311; Stemmer et al. (1995) Gene 164:49-53.
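The step of designing a nucleotide sequence with codons for a desired amino acid sequence can be sketched in Python as a simple back-translation. The codon table below is hypothetical in its selection: each entry is a valid codon for that amino acid under the standard genetic code, but the particular codon was picked arbitrarily for illustration and is not a host-optimized choice. In practice, the table would be replaced with one reflecting the codon usage of the intended host.

```python
# One valid standard-genetic-code codon per amino acid (one-letter codes).
# The specific choice among synonymous codons is arbitrary and illustrative.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Return a DNA coding sequence encoding `protein` using CODON_FOR."""
    codons = []
    for residue in protein.upper():
        if residue not in CODON_FOR:
            raise ValueError(f"unknown residue: {residue!r}")
        codons.append(CODON_FOR[residue])
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)
```

A designed sequence of this kind would then be divided into overlapping oligonucleotides for assembly, as described above; that step is not shown here.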

Once coding sequences have been isolated and/or synthesized, they can be cloned into any suitable vector or replicon for expression. Numerous expression vectors are known to those of skill in the art, and the selection of an appropriate expression vector is a matter of choice. For example, a bacterial plasmid expression vector may be used to transform a bacterial host. Bacterial expression vectors include, but are not limited to, pACYC177, pASK75, pBAD, pBADM, pBAT, pCal, pET, pETM, pGAT, pGEX, pHAT, pKK223, pMal, pProEx, pQE, and pZA31 vectors. See, e.g., Sambrook et al., supra.

Alternatively, plant expression systems can also be used to produce modified cellulose as described herein. Generally, such systems use virus-based vectors to transfect plant cells with heterologous genes. Exemplary plant viruses include the tobacco mosaic virus (TMV), potato virus X, and cowpea mosaic virus. A number of plant expression systems use the Ti plasmid of Agrobacterium tumefaciens. For a description of plant expression systems, see, e.g., Zaidi et al. (2017) Front. Plant Sci. 8:539; Hefferon (2014) Biomed. Res. Int. 2014:785382; Porta et al. (1996) Mol. Biotech. 5:209-221; and Hackland et al. (1994) Arch. Virol. 139:1-22. In addition, algae expression systems are available for Chlamydomonas reinhardtii and Synechococcus elongatus. See, e.g., Doron et al. (2016) Front. Plant Sci. 7:505 and Griesbeck et al. (2006) Mol. Biotechnol. 34(2):213-223.

A gene can be placed under the control of a promoter, ribosome binding site (for bacterial expression) and, optionally, an operator (collectively referred to herein as “control” elements), so that the DNA sequence encoding the desired polypeptide is transcribed into RNA in the host cell transformed by a vector containing this expression construction. The coding sequence may or may not contain a signal peptide or leader sequence. With the present invention, both the naturally occurring signal peptides and heterologous sequences can be used. Leader sequences can be removed by the host in post-translational processing. See, e.g., U.S. Pat. Nos. 4,431,739; 4,425,437; 4,338,397. Such sequences include, but are not limited to, the TPA leader, as well as the honey bee melittin signal sequence.

Other regulatory sequences may also be desirable which allow for regulation of expression of the protein sequences relative to the growth of the host cell. Such regulatory sequences are known to those of skill in the art, and examples include those which cause the expression of a gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. Other types of regulatory elements may also be present in the vector, for example, enhancer sequences. The control sequences and other regulatory sequences may be ligated to the coding sequence prior to insertion into a vector. Alternatively, the coding sequence can be cloned directly into an expression vector that already contains the control sequences and an appropriate restriction site.

In some cases, it may be necessary to modify the coding sequence so that it may be attached to the control sequences with the appropriate orientation; i.e., to maintain the proper reading frame. Mutants or analogs may be prepared by the deletion of a portion of the sequence encoding the protein, by insertion of a sequence, and/or by substitution of one or more nucleotides within the sequence. Techniques for modifying nucleotide sequences, such as site-directed mutagenesis, are well known to those skilled in the art. See, e.g., Sambrook et al., supra; DNA Cloning, Vols. I and II, supra; Nucleic Acid Hybridization, supra.

The expression vector is then used to transform an appropriate cellulose-producing host cell. Depending on the expression system and host selected, the modified cellulose is produced by growing host cells transformed by an expression vector described above under conditions whereby the BcsG phosphoethanolamine transferase is expressed (i.e., with or without the BcsE and BcsF-encoded proteins). The BcsG phosphoethanolamine transferase catalyzes phosphoethanolamine transfer to hydroxyl groups of the cellulose produced by the host. The selection of the appropriate growth conditions is within the skill of the art.

Phosphoethanolamine cellulose can be produced in bacteria, for example, by culturing bacteria in media containing a suitable carbon source. Exemplary carbon sources include monosaccharides (e.g., glucose and fructose), disaccharides (e.g., sucrose, maltose, and lactose), oligosaccharides, polysaccharides (e.g., starch hydrolysates), mannitol, ethanol, acetic acid, citric acid, glycerol, beet molasses (B-Mol), and biodiesel fuel by-product (BDF-B). One or more carbon sources may be used. The choice of carbon sources will depend on the type of bacteria used for production of cellulose, the culture conditions, the cost of production, and the like. Media may be supplied manually or automatically with a continuous, batch, or semi-batch fed culture system.

B. Applications

Phosphoethanolamine cellulose will find use in a wide variety of industrial, nutritional, electronic, scientific, and medical applications. For example, phosphoethanolamine cellulose may be used in various applications in which other forms of cellulose are currently used, such as in production of paper, textile, biofuels, food, pharmaceutical fillers, cellulose composites for electronic devices, nanocellulosic materials, and liquid filtration and chromatographic media.

The phosphoethanolamine group enhances solubility of the cellulose and facilitates the conversion of the polymer to shorter polysaccharides and oligosaccharides or monosaccharides such as glucose through physical, chemical (e.g., acid hydrolysis), or enzymatic (e.g., cellulase catalyzed hydrolysis) methods. For example, phosphoethanolamine cellulose can be hydrolyzed by contacting the phosphoethanolamine cellulose with one or more cellulases. In certain embodiments, one or more endocellulases, exocellulases, beta-glucosidases, oxidative cellulases, cellulose phosphorylases, or a combination thereof are used in hydrolysis of phosphoethanolamine cellulose.

The enhanced conversion of phosphoethanolamine cellulose to glucose provides an attractive route for production of ethanol from cellulosic biomass either in bacteria (e.g., Escherichia coli, Acetobacter xylinum, or other bacterial cellulose producer) or through bioengineering of other organisms to express bcsG (e.g., either by itself or in combination with the bcsE and bcsF genes), for example, in Miscanthus or other plants or algae.

The phosphoethanolamine cellulose may also be further modified due to its reactive amine group functionality to generate a wide number of other cellulosic materials. For example, the amine group is readily alkylated, acylated, or sulfonated. In particular, the amine group can be conjugated to various agents such as peptides, antibodies, enzymes, nucleic acids, dyes, ligands, or drugs. Methods for conjugating amines are well known in the art. For example, conjugation may be performed with amine-reactive succinimidyl esters or click chemistry. For a description of various conjugation techniques, see, e.g., Bioconjugation Protocols: Strategies and Methods (S. S. Mark ed., Humana Press, 2016), G. T. Hermanson Bioconjugate Techniques (Academic Press, 3rd edition, 2013), Click Chemistry for Biotechnology and Materials Science (J. Lahann ed., Wiley, 2009); herein incorporated by reference in their entireties.

In particular, phosphoethanolamine cellulose can be chemically modified to produce useful cellulose ester and ether derivatives. For example, cellulose ester derivatives can be formed by esterification of cellulose hydroxyl groups with an organic acid, acid anhydride, or acid chloride, or an inorganic acid. Exemplary organic acids that can be used in esterification include acetic acid, propanoic acid, and butyric acid. Alternatively, the corresponding acid anhydrides (e.g., acetic anhydride, propionic anhydride, and butyric anhydride) or acid chlorides (e.g., acetyl chloride) can be used. Exemplary inorganic acids that can be used in esterification include nitric acid and sulfuric acid. For a description of methods of synthesizing cellulose esters, see, e.g., Edgar et al. (2001) Progress in Polymer Science 26:1605-1688, Cao et al. (2013) J. Agric. Food Chem. 61:2489-2495, Heinze et al. (2003) Cellulose 10:283-296, Krassig (1993) Cellulose (Polymer Monographs) Volume 11, CRC Press, Liebert et al. (2005) Biomacromolecules 6:333-340, El-Sakhawy et al. (2014) J. Drug Deliv. 2014:575969, U.S. Pat. Nos. 9,624,311, 9,458,248, 9,217,043, 9,708,415, 8,273,872, 6,184,373, 5,750,677, 2,651,629, and 3,097,051; herein incorporated by reference. Cellulose esters are commonly used, for example, as binders, coating additives, and film formers or modifiers, and may find use in automotive, wood, plastic, paper, apparel, photography, and leather coatings applications.

Alternatively, the cellulose hydroxyl groups can be chemically modified to produce a cellulose ether, such as an alkyl ether (e.g., methylcellulose, ethylcellulose), hydroxyalkyl ether (e.g., hydroxyethylcellulose, hydroxypropyl cellulose), or carboxyalkyl ether (e.g., carboxymethylcellulose). Cellulose ethers are commonly prepared using an alkali metal hydroxide to deprotonate cellulose hydroxyl groups, which are reacted with an etherifying agent such as an alkyl halide, alkyl sulfate, alkylene oxide, or chlorohydrin. For a description of methods of synthesizing cellulose ethers, see, e.g., Kristin Schumann et al. (2009) Macromolecular Symposia 280:86-94, Goncalves et al. (2015) Carbohydrate Polymers 116:51-59, Lorand (1939) Ind. Eng. Chem. 31:891-897, U.S. Pat. Nos. 2,512,338, 8,541,571, and 9,580,516; herein incorporated by reference. Cellulose ethers are commonly used, for example, as thickeners, binders, film formers, water-retention agents, suspension aids, surfactants, lubricants, and protective colloids and emulsifiers, and may find use in construction, ceramics, paints, foods, cosmetics, and pharmaceuticals.

In particular, cellulose esters and ethers may find use in pharmaceuticals for sustained and controlled release formulations, osmotic drug delivery systems, bioadhesives and mucoadhesives, compressibility enhancers in tablets, liquid dosage forms as thickening agents and stabilizers, binders in tablets, semisolid preparations as gelling agents, and various other applications.

III. Experimental

Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way.

Efforts have been made to ensure accuracy with respect to numbers used(e.g., amounts, temperatures, etc.), but some experimental error anddeviation should, of course, be allowed for.

Example 1 Phosphoethanolamine Cellulose: A Naturally Produced ChemicallyModified Cellulose

Introduction

Here, we report on the determination of the structure of a modified cellulose, phosphoethanolamine cellulose (pEtN cellulose), produced naturally by E. coli and other Gram-negative bacteria. We provide the genetic basis for its production and the functional implications of gene-directed pEtN cellulose synthesis.

E. coli and Salmonella are among the best-studied microorganisms reported to produce cellulose. These include human pathogens such as uropathogenic and enterohemorrhagic E. coli. Functionally, the exopolysaccharide cellulose is a major component of the self-produced extracellular matrix in biofilms, which represent physiologically heterogeneous and spatially structured bacterial communities (Stewart et al. (2008) Nature Rev. Microbiol. 6, 199-210; Serra et al. (2013) mBio 4(2), e00103-00113). Biofilm formation is of high medical relevance as it confers enhanced resistance to antibiotics and host defenses during infection (Anderson et al. (2008) Curr. Top. Microbiol. Immunol. 322, 87-107). Within the biofilm matrix, cellulose forms a nanocomposite with amyloid curli fibers that encapsulates individual cells in supramolecular basket-like structures, enmeshes the bacterial community and confers cohesion and elasticity that allows biofilms to fold and buckle up in a tissue-like manner (Hung et al. (2013) mBio 4, e00645-00613; McCrate et al. (2013) J. Mol. Biol. 425, 4286-4294; Serra et al. (2013) J. Bacteriol. 195, 5540-5554). Biochemical and solid-state NMR measurements with the clinically important uropathogenic E. coli strain UTI89 established that the matrix was composed of curli fibers and cellulosic material in a 6:1 ratio by mass. During this bottom-up analysis involving ¹³C and ¹⁵N NMR analysis of the purified components, we also discovered that the cellulose portion appeared to be modified in some way with an aminoethyl functionality (McCrate et al., supra).

Solid-state NMR analysis of the intact cellulosic material, complemented by solution-state NMR analysis of acid-digested material, has now enabled the determination of the chemical structure of the modified cellulose as a polymer containing glucose and glucose-6-phosphoethanolamine (FIG. 1A). The ¹³C CPMAS spectrum of cellulose isolated from the laboratory strain of E. coli, AR3110ΔcsgB, which lacks amyloid curli fibers, contains carbon contributions from the glucose backbone plus two additional carbons at 41 ppm and 63 ppm (FIG. 1B). The C6 carbon appears at 62 ppm for the unmodified glucose units and at 66 ppm for the modified glucose units. The isolation and purification of the cellulosic polymer from E. coli was aided by the use of the classic dye Congo red (Reichhardt et al. (2016) Anal. Bioanal. Chem. 408, 7709-7717). Congo red (CR) is commonly used as a supplement in nutrient agar plates for evaluation of E. coli and Salmonella community phenotypes since both curli and cellulosic polymers bind the dye while generating the hallmark colony wrinkling exhibited by biofilm-producing Enterobacteriaceae (Romling (2005) Cell. Mol. Life Sci. 62, 1234-1246). The use of the dye did not alter the modification of cellulose, and the ¹³C centerband peaks associated with CR are resolved from cellulose (FIG. 1B). The CR-containing ¹³C CPMAS spectrum for the modified cellulose from AR3110ΔcsgB is identical to that from the UTI89 curli mutant, UTI89ΔcsgA (FIG. 3).

C{N} REDOR assigned the 41-ppm carbon peak as the C8 carbon, the only carbon that is directly bonded to nitrogen (FIG. 4). ³¹P CPMAS (FIG. 5) and C{P} REDOR (FIG. 1C) confirmed the presence of ³¹P in the polymer and indicated that the full modification is a phosphoethanolamine extending from the C6 carbon, wherein phosphorus is nearest to the C6 and C7 carbons and next closest to the C5 and C8 carbons (FIG. 1B). Solution-state NMR analysis of the intact cellulose would ordinarily not be expected to reveal resolved carbons given the insolubility of the material. Yet, the sufficient mobility of the E. coli modified cellulose enabled the detection of ¹H resonances with splitting patterns resembling those of -OCH2- and -CH2N- in phosphoethanolamine (FIG. 6).

Additional solution-state NMR analysis was performed on components released into solution after acid hydrolysis and supported the assignments from solid-state NMR. However, acid hydrolysis led to degradation of the modification as well as precipitation of some of the material that was subsequently not assayed, observations that explain the difficulty of detecting the modified cellulose using conventional methods. Nevertheless, the two distinct modification carbons (C7 and C8) were observed in the solution NMR spectrum of acid-digested material (FIG. 7). The C5 and C6 ¹³C chemical shifts in the modified glucose shifted upfield and downfield, respectively, as expected for a modification at C6 (FIG. 8). The complete set of ¹³C and ¹H chemical shifts revealed the presence of soluble glucose, glucose-6-phosphate, and ethanolamine after hydrolysis and are provided in comparison with standard samples similarly referenced in acid (FIGS. 7 and 8). Solution-state ¹H COSY and ¹H-¹³C HSQC spectra of the acid-hydrolyzed material additionally supported the assignments (FIGS. 9 and 10). Finally, a solid-state ¹³C CP array NMR experiment confirmed that approximately one half of the cellulose glucose units in the intact polymer are modified with pEtN on the glucose C6 carbon (FIG. 11).

A biosynthetically modified cellulose has wide-ranging implications and potential applications. Among these, the specifically modified cellulose could be essential for the formation and function of bacterial biofilms containing the polymer; could exhibit attractive properties for new cellulosic materials; and potentially could be introduced into other organisms if gene-directed. Thus, we sought to identify the genes involved in the installation of the cellulose modification. The bcsEFG operon, which is part of the cellulose gene cluster in E. coli, had not been ascribed a definitive role in cellulose synthesis. The ¹³C CPMAS NMR comparison of the isolated cellulose from a ΔbcsG mutant revealed that the bcsG gene was indispensable for the cellulose modification (FIG. 1D). The spectrum lacks the contributions from the 41-ppm C8 carbon and the 63-ppm C7 carbon. As expected, the sugar C6 carbon contribution appears only at the upfield ¹³C position of 62 ppm, corresponding only to unmodified glucose. Complementation of the ΔbcsG mutant with bcsG on a plasmid restored production of the modified cellulose (FIG. 12). The prevalence of the modification was reduced in in-frame non-polar ΔbcsE and ΔbcsF mutants, indicating that BcsE and BcsF may play an accessory and possibly regulatory role in the installation of pEtN by BcsG (FIG. 13).

Biofilm production by wild-type E. coli involves the coproduction and tight association of amyloid curli fibers and what has been considered to be cellulose (Serra et al. (2013) mBio 4(2), e00103-00113; Serra et al. (2013) J. Bacteriol. 195, 5540-5554). Yet, we have now determined that the widely studied E. coli strains UTI89 and AR3110 produce pEtN cellulose. Thus, we sought to test whether this cellulose modification is functionally important for matrix assembly by evaluation of macrocolony morphotypes. BcsG is required for the wrinkling typically observed for the AR3110 macrocolonies. BcsG is also required for the formation of a pellicle, a biofilm formed at the air-liquid interface. Thus, pEtN modification of cellulose is required for community behavior exhibited when both curli and pEtN cellulose are co-produced.

BcsG is composed of 559 amino acids and has been predicted to be an integral membrane protein. A hydropathy plot analysis of BcsG supported the presence of several putative transmembrane spanning regions in the N-terminal 160 amino acids followed by a large hydrophilic C-terminal domain.
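By way of illustration only, a hydropathy profile of the kind referred to above can be computed with a sliding-window average. The sketch below assumes the Kyte-Doolittle scale, a 19-residue window, and a cutoff of 1.6 for candidate membrane-spanning windows; none of these choices is stated in the present disclosure. The sequence used is a made-up toy peptide, not BcsG.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along seq (one value per window start)."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(seq)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_windows(profile, threshold=1.6):
    """Start indices of windows whose mean hydropathy meets the (assumed) cutoff."""
    return [i for i, value in enumerate(profile) if value >= threshold]


# Hypothetical toy sequence (NOT BcsG): a short polar lead-in, a 19-residue
# hydrophobic stretch, then a hydrophilic tail.
toy = "MKRS" + "LLIVAALLVFGLLIAVLLA" + "DKRESNQDEKRSTDNQEKR"
profile = hydropathy_profile(toy)
tm_starts = candidate_tm_windows(profile)
```

Applied to a real protein sequence, runs of consecutive window starts above the cutoff would mark putative transmembrane segments, and a long stretch of negative values would correspond to a hydrophilic domain.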

To date, we have developed a model in which BcsG acts as a phosphoethanolamine transferase, modifying cellulose after its emergence from the BcsA-BcsB machinery. The stoichiometry of the modification as occurring on approximately one half of the glucose units suggests that there is recognition of a disaccharide to result in modification of one glucose unit. A higher stoichiometry is possible.

We also addressed the question of the substrate for BcsG-mediated pEtN modification of cellulose emerging from the BcsA-BcsB complex. We noticed that, with respect to overall size (559 and 563 residues, respectively) and the length and transmembrane orientation of domains, BcsG resembles EptB, a phosphoethanolamine transferase that uses the phospholipid phosphatidylethanolamine (PE) to modify bacterial lipopolysaccharide (LPS) (Reynolds et al. (2005) J. Biol. Chem. 280, 21202-21211). We therefore hypothesized that pEtN modification by BcsG may also originate from PE. In this case the modified cellulose should have atoms derived from serine, which serves as a direct precursor for the ethanolamine moiety of PE. Thus, pEtN cellulose was prepared from cells grown on agar medium supplemented with 25 mg/L L-[3-¹³C]Ser to detect whether pEtN cellulose would be enriched through incorporation of the serine label. The expected C7 carbon in the pEtN cellulose spectrum was indeed enhanced due to label incorporation from serine (FIG. 2A). The routing of serine into the modification is consistent with PE serving as a substrate for BcsG. Inspired by labeled serine incorporation, we sought to employ an isotopic labeling strategy that could identify whether pEtN cellulose was present in an intact extracellular matrix preparation with curli present in addition to the polysaccharide. The L-[3-¹³C]Ser would not provide a unique signature for the modified cellulose as serine contributes significantly to the major curli subunit protein, CsgA. Thus, we employed L-[¹⁵N]Ser labeling, anticipating an amine ¹⁵N contribution from pEtN cellulose resolved from curli amide signals. Inspection of the ¹⁵N CPMAS spectra of ECM isolated from AR3110 and AR3110ΔbcsG revealed that the amine nitrogen resulting from L-[¹⁵N]Ser labeling was due to the pEtN cellulose and was not observed in the AR3110ΔbcsG spectrum (FIG. 2B). The amide peaks associated with curli serine residues and also glycine through isotopic scrambling are present in both spectra. In this way, the potential presence of pEtN cellulose can be determined in intact ECM preparations from different E. coli strains and different organisms.

Finally, together with core cellulose genes, bcsEFG genes occur in many γ- and β-proteobacteria (Romling et al. (2015) Trends Microbiol. 23, 545-557). We isolated the cellulose material from Salmonella enterica serovar Typhimurium strain IR715ΔcsgBA (Tukel et al. (2005) Mol. Microbiol. 58, 289-304), a curli mutant, and discovered that it also produces pEtN cellulose. The ¹³C CPMAS NMR spectrum of isolated cellulose from Salmonella matches that of pEtN cellulose from E. coli (FIG. 2E). Thus, the pEtN modification of cellulose is likely to be common in the γ and β branches of proteobacteria. Bacterial species that do not possess bcsEFG genes but feature other accessory bcs genes of unknown function could possibly use alternative modes of cellulose modification.

Modified cellulose is produced by strains that have been assumed in the literature to be producing standard amorphous cellulose based on simple Calcofluor staining procedures and conventional isolation methods designed for the detection of glucose from hydrolyzed cellulose.

However, these methods involve harsh hydrolysis protocols and crude purification or enrichment methods, followed by chromatography and mass spectrometry, and have not attempted a complete accounting of the intact material. Solid-state NMR analysis of the relevant intact polysaccharide was able to identify this biologically important pEtN modification that evaded detection by conventional approaches. PEtN cellulose is a newly identified zwitterionic polymer and, to our knowledge, our study provides the first definitive evidence of a naturally post-synthetically modified cellulose. In the extracellular matrix of bacterial biofilms, pEtN modification of cellulose allows enhanced network formation when the polymer is coproduced with amyloid curli fibers. Thus, inhibition of BcsG could offer new opportunities to control biofilm formation, in particular by Gram-negative pathogens associated with infections, particularly serious and chronic infections. Furthermore, the identification of the gene-directed biosynthetic machinery also inspires the generation of engineered systems to produce alternately modified cellulose materials.

Materials and Methods

Bacterial Strains and Culture Conditions

The strains used in this study include laboratory E. coli K-12 strains W3110 and AR3110, uropathogenic E. coli strain UTI89 (O18:K1:H7), and Salmonella enterica serovar Typhimurium strain IR715ΔcsgBA (Hayashi et al. (2006) Mol. Syst. Biol. 2, 2006.0007). E. coli K-12 strain AR3110 is a direct derivative of W3110 (Hayashi et al., supra), in which codon 6 (the stop codon TAG) in the chromosomal copy of bcsQ was changed to the sense codon TTG (Serra et al. (2013) J. Bacteriol. 195, 5540-5554). Knockout mutations generated in AR3110 or W3110 are full open reading frame deletion/antibiotic resistance cassette insertions previously described (Serra et al. (2013) J. Bacteriol. 195, 5540-5554; Richter et al. (2014) EMBO Mol. Med. 6, 1622-1637). Mutations were transferred using P1 transduction (Miller, Experiments in molecular genetics. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory (1972)).

In order to grow macrocolony biofilms, 5 μl of an overnight culture in LB medium was spotted on freshly prepared agar plates containing YESCA (Lim et al. (2014) Biochem. Biophys. Res. Comm. 443, 345-350) medium. Plates were supplemented with Congo red (40 μg/ml) and Coomassie Brilliant blue (20 μg/ml). Since cellulose and curli fiber expression occurs below 30° C. in E. coli K-12 derivatives, plate and liquid cultures were grown at 26° C. Photography of macrocolonies was previously described (Serra et al. (2013) mBio 4(2), e00103-00113).

ECM and Polysaccharide Isolation

Non-isotopically labeled NMR samples of cellulosic materials were prepared from UTI89ΔcsgA, AR3110ΔcsgBA, AR3110ΔcsgBAΔbcsE, AR3110ΔcsgBAΔbcsF, AR3110ΔcsgBAΔbcsG, AR3110ΔcsgBAΔbcsG/pBcsG, and Salmonella IR715ΔcsgBA grown on YESCA agar supplemented with 25 μg/mL Congo red at 26° C. for 60 hours. Bacterial cells were scraped into 10 mM Tris, pH 7.4 and sheared using an OmniMixer homogenizer for five cycles of one-minute shear and two-minute rest. Cells were pelleted by centrifugation at 5,000 g at 4° C. for 10 minutes and washed and pelleted two additional times in the Tris buffer. The resulting supernatants were spiked with 5 M NaCl to achieve a final concentration of 170 mM NaCl. The ECM or cellulosic material was then pelleted by centrifugation at 13,000 g for one hour. The pellets were subjected to 4% SDS treatment overnight and subsequently washed to remove all SDS.
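The salt-spiking step above amounts to a simple mass balance. As a minimal sketch, assuming the Tris supernatant contains no NaCl before spiking and using a hypothetical 10 mL supernatant volume (neither is specified in the protocol), the volume of 5 M stock needed to reach 170 mM can be estimated as follows:

```python
def stock_volume_to_add(sample_volume_ml, stock_molar, target_molar, initial_molar=0.0):
    """Volume of concentrated stock (mL) that brings a sample to the target molarity.

    Mass balance: (C0*V0 + Cs*Va) / (V0 + Va) = Ct,
    hence Va = V0 * (Ct - C0) / (Cs - Ct).
    """
    if not (initial_molar <= target_molar < stock_molar):
        raise ValueError("need initial <= target < stock concentration")
    return sample_volume_ml * (target_molar - initial_molar) / (stock_molar - target_molar)


# Hypothetical example: 10 mL of supernatant, 5 M NaCl stock, 170 mM final.
supernatant_ml = 10.0
nacl_ml = stock_volume_to_add(supernatant_ml, stock_molar=5.0, target_molar=0.170)

# Back-check: final molarity after mixing the two volumes.
final_molar = 5.0 * nacl_ml / (supernatant_ml + nacl_ml)
```

For this hypothetical 10 mL sample the result is roughly 0.35 mL of stock; the formula accounts for the volume added by the stock itself, which the simpler C1V1 = C2V2 shortcut ignores.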

Uniformly ¹⁵N-labeled pEtN cellulose was prepared as described above, but from cells grown on a modified version of Neidhardt's MOPS minimal agar medium supplemented with Congo red (McCrate et al. (2013) J. Mol. Biol. 425, 4286-4294). Serine-labeled pEtN and ECM samples were prepared from cells grown on YESCA agar medium supplemented with 25 μg/mL Congo red and either 25 mg/L L-[3-¹³C]Ser or L-[¹⁵N]Ser.

PEtN cellulose was purified without Congo red using a strain engineered to overexpress diguanylate cyclase to increase the yield of isolated material. The pEtN cellulose overproducing strain UTI89ΔcsgA/pMMB956 was created by transforming UTI89ΔcsgA with pMMB956, a plasmid overexpressing diguanylate cyclase under IPTG induction. The bacteria were grown at 26° C. for 60 hours on YESCA agar containing 250 μM of IPTG. Bacterial cells were harvested into 10 mM Tris, pH 7.4 and sheared using an OmniMixer homogenizer for five cycles of one-minute shear and two-minute rest. Cells were pelleted by centrifugation at 5,000 g for 10 minutes and washed and pelleted two additional times in the Tris buffer. The resulting supernatants were dialyzed using 100 kDa dialysis membrane against water overnight. The dialyzed solutions were frozen, thawed, and centrifuged at 10,000 g at 4° C. for 20 minutes to pellet the cellulosic material. The cellulosic materials were subjected to 4% SDS treatment overnight and then washed to remove all SDS.

Solid-State NMR Experiments

All the ¹³C CPMAS, ¹⁵N CPMAS, ¹³C CP array and C{N} REDOR experiments were performed using an 89-mm wide-bore Varian magnet at 11.7 T (499.12 MHz for ¹H and 125.52 MHz for ¹³C) and a home-built four-frequency transmission-line probe with a 13.66-mm-long, 6-mm inner diameter sample coil, and a Revolution NMR MAS Vespel stator. Samples were spun in thin-wall 5 mm outer diameter zirconia rotors (Revolution NMR, LLC) at 7143±2 Hz using a Varian MAS control unit. For all NMR experiments, the π-pulse lengths were 7 μs for ¹H and 10 μs for ¹³C and ¹⁵N. The recycle delay was 2 s. The proton-carbon and the proton-nitrogen cross polarizations occurred at 50 kHz for 1.5 ms unless otherwise noted. The ¹³C spectra were referenced to TMS as 0.0 ppm, which was determined relative to an adamantane standard at 38.5 ppm. The ¹⁵N spectra were referenced to liquid ammonia at 0 ppm.
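Two derived quantities follow directly from the acquisition parameters above and may help a reader reproduce the timing: the rotor period implied by the 7143 Hz spinning rate, and the radiofrequency field strength implied by a given pulse length. The sketch below reads the quoted 7 μs and 10 μs values as π-pulse (180°) lengths and uses the standard relation ν₁ = 1/(2·t_π); the function names are illustrative and not part of the disclosure.

```python
def rotor_period_us(spin_rate_hz):
    """Duration of one magic-angle-spinning rotor cycle, in microseconds."""
    return 1e6 / spin_rate_hz


def rf_field_khz_from_pi_pulse(pi_pulse_us):
    """RF nutation frequency (kHz) for a 180-degree pulse of the given length (us).

    A pi pulse is half a nutation cycle, so nu1 = 1 / (2 * t_pi).
    """
    return 1e3 / (2.0 * pi_pulse_us)


period_us = rotor_period_us(7143.0)            # MAS rate quoted above
h1_field_khz = rf_field_khz_from_pi_pulse(7.0)    # 1H pi pulse
x_field_khz = rf_field_khz_from_pi_pulse(10.0)    # 13C / 15N pi pulse
```

Under this reading, the 7143 Hz spinning rate corresponds to a rotor period of very nearly 140 μs, and a 10 μs π pulse corresponds to a 50 kHz field.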

The ³¹P CPMAS and C{P} REDOR spectra were obtained using a 500 MHz spectrometer with an 89 mm-bore Magnex magnet; a six-frequency transmission-line probe having a 12-mm long, 6-mm inside-diameter analytical coil housed in a Chemagnetics/Varian magic-angle spinning ceramic stator, controlled by a Tecmag pulse programmer; and active feedback control of high-power amplifiers. ¹H-¹³C cross-polarization transfers were made with radiofrequency fields of 62.5 kHz. The π-pulse lengths were 8 μs for ¹³C and 9 μs for ³¹P. Proton dipolar decoupling was 100 kHz with TPPM modulation (Bennett et al. (1995) J. Chem. Phys. 103, 6951-6958) during dipolar evolution and data acquisition.

Solution-State NMR Experiments

5 mg/mL of pEtN cellulose was digested in 0.5 M HCl in D₂O at 95° C. for 48 hours. Brown precipitates were observed and pelleted by centrifugation at 5,000 g for 5 minutes. 700 μL of the supernatant was transferred to a solution NMR tube and DSS was added as a chemical shift reference standard. ¹H NMR spectra of the hydrolyzed pEtN cellulose and control samples of ethanolamine, glucose and glucose-6-phosphate were collected on a 600 MHz Varian NMR spectrometer with water suppression using a pre-saturation method. ¹³C NMR spectra were collected on a 500 MHz Varian NMR spectrometer. ¹H-¹H COSY and ¹H-¹³C HSQC of the hydrolyzed pEtN cellulose were measured on a 600 MHz Varian NMR spectrometer. The intact pEtN cellulose sample analyzed by solution-state NMR was prepared by adding D₂O to lyophilized pEtN cellulose and sonicating briefly prior to the ¹H NMR measurement on a 600 MHz Varian NMR spectrometer.

Construction of Translational lacZ and phoA Reporters and Enzyme Assays

Plasmids for the expression of translational lacZ or phoA fusions to bcsF and bcsG were constructed using oligonucleotide primers listed in Table 1 and pJL28 (Lucht et al. (1994) J. Biol. Chem. 269, 6578-6586) or pAP28 as a vector. pAP28 is a phoA translational fusion vector based on pJL28 in which part of lacZ was removed via HindIII/EcoRV digestion and then replaced by the phoA sequence without its signal sequence-encoding 5′-part (using oligos FMO-68/-69 listed in Table 1).

β-galactosidase (LacZ) activity was determined using o-nitrophenyl-β-D-galactopyranoside (ONPG) as a substrate and is reported as μmol of o-nitrophenol per min per mg of cellular protein. Alkaline phosphatase (PhoA) activity was determined using p-nitrophenyl-phosphate (pNPP) as a substrate and is reported as μmol of p-nitrophenol per min per mg of cellular protein. All enzyme assays were done at least in triplicate with average data and standard deviations shown in the respective figures.
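The activity units above (μmol of product per min per mg of protein) can be obtained from an absorbance reading through the Beer-Lambert law. The following is a minimal sketch of that conversion, not the assay procedure actually used: the extinction coefficient depends on wavelength and buffer conditions and must be supplied by the user, and every number in the example call is a hypothetical placeholder chosen only to make the arithmetic easy to follow.

```python
def specific_activity(delta_abs, epsilon_mM_cm, path_cm, assay_volume_ml,
                      minutes, protein_mg):
    """Specific activity in umol product per min per mg protein.

    Beer-Lambert: concentration (mM) = delta_abs / (epsilon * path length).
    1 mM equals 1 umol/mL, so multiplying by the assay volume (mL) gives umol.
    """
    if min(epsilon_mM_cm, path_cm, minutes, protein_mg) <= 0:
        raise ValueError("epsilon, path length, time and protein must be positive")
    product_mM = delta_abs / (epsilon_mM_cm * path_cm)
    product_umol = product_mM * assay_volume_ml
    return product_umol / (minutes * protein_mg)


# Hypothetical placeholder values (not taken from the present study):
activity = specific_activity(
    delta_abs=0.45,        # blank-corrected absorbance change
    epsilon_mM_cm=4.5,     # user-supplied coefficient, mM^-1 cm^-1 (placeholder)
    path_cm=1.0,
    assay_volume_ml=1.0,
    minutes=10.0,
    protein_mg=0.05,
)
```

The same function serves both the ONPG and pNPP assays; only the extinction coefficient of the nitrophenol product changes.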

TABLE 1
Oligonucleotide primers used in the present study.

I. Primers used for generating translational lacZ and phoA fusions:

Primer                    Sequence                             SEQ ID NO
FMO-68 (HindIII)          TAAATAAGCTTCGGACACCAGAAATGCC         1
FMO-69 (EcoRV)            TAAATGATATCTTAAGTCTGGTTGCTAACAGC     2
bcsEFG-BamHI              GGCCCGGGATCCCTGGAAGATATAGCCTATCGC    3
bcsF +3-phoA-HindIII      GGCCCGAAGCTTCATGATGAGCGCTCCACAG      4
bcsF +72-phoA-HindIII     GGCCCGAAGCTTCAGATAGCCCAGCGGGAAAA     5
bcsG +3-phoA-HindIII      GGCCCGAAGCTTCATTTTTTGGTTGCCCTGGC     6
bcsG +474-phoA-HindIII    GGCCCGAAGCTTACTTGGTCCCGCCAGGGTA      7
bcsF +4-lacZ-HindIII      GGCCCGAAGCTTTCATGATGAGCGCTCCACAG     8
bcsF +73-lacZ-HindIII     GGCCCGAAGCTTCCAGATAGCCCAGCGGGAA      9
bcsG +4-lacZ-HindIII      GGCCCGAAGCTTTCATTTTTTGGTTGCCCTGGC    10
bcsG +87-lacZ-HindIII     GGCCCGAAGCTTGCCACAAGGAGAAACTTGGTC    11
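As a simple consistency check on Table 1, each primer can be scanned for the recognition sequence of the restriction enzyme named in its label. The sketch below uses the standard recognition sites for HindIII (AAGCTT), EcoRV (GATATC) and BamHI (GGATCC); the data structure and function names are illustrative only.

```python
# Standard recognition sequences of the enzymes named in Table 1.
SITES = {"HindIII": "AAGCTT", "EcoRV": "GATATC", "BamHI": "GGATCC"}

# (primer name, enzyme named in the label, sequence 5'->3') as listed in Table 1.
PRIMERS = [
    ("FMO-68", "HindIII", "TAAATAAGCTTCGGACACCAGAAATGCC"),
    ("FMO-69", "EcoRV", "TAAATGATATCTTAAGTCTGGTTGCTAACAGC"),
    ("bcsEFG-BamHI", "BamHI", "GGCCCGGGATCCCTGGAAGATATAGCCTATCGC"),
    ("bcsF +3-phoA-HindIII", "HindIII", "GGCCCGAAGCTTCATGATGAGCGCTCCACAG"),
    ("bcsF +72-phoA-HindIII", "HindIII", "GGCCCGAAGCTTCAGATAGCCCAGCGGGAAAA"),
    ("bcsG +3-phoA-HindIII", "HindIII", "GGCCCGAAGCTTCATTTTTTGGTTGCCCTGGC"),
    ("bcsG +474-phoA-HindIII", "HindIII", "GGCCCGAAGCTTACTTGGTCCCGCCAGGGTA"),
    ("bcsF +4-lacZ-HindIII", "HindIII", "GGCCCGAAGCTTTCATGATGAGCGCTCCACAG"),
    ("bcsF +73-lacZ-HindIII", "HindIII", "GGCCCGAAGCTTCCAGATAGCCCAGCGGGAA"),
    ("bcsG +4-lacZ-HindIII", "HindIII", "GGCCCGAAGCTTTCATTTTTTGGTTGCCCTGGC"),
    ("bcsG +87-lacZ-HindIII", "HindIII", "GGCCCGAAGCTTGCCACAAGGAGAAACTTGGTC"),
]


def site_positions(seq, site):
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


# Map each primer to the positions of its expected restriction site.
report = {name: site_positions(seq, SITES[enzyme]) for name, enzyme, seq in PRIMERS}
missing = [name for name, hits in report.items() if not hits]
```

An empty `missing` list indicates that every primer carries the site named in its label near its 5′ end, as expected for cloning primers with a short 5′ clamp.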

Example 2 Production of Non-Modified Cellulose by E. coli Lacking the bcsG Gene

The AR3110ΔcsgBAΔbcsG mutants were grown on Congo red-supplemented YESCA agar to qualitatively evaluate cellulose production. The Congo red serves as an indicator of both unmodified cellulose and modified cellulose production. Congo red binding by the AR3110ΔcsgBA mutant is attributed only to cellulose as it does not produce curli. AR3110ΔcsgBAΔbcsG exhibited Congo red binding that was comparable to AR3110ΔcsgBA, whereas Congo red binding was abrogated in the bcsE and bcsF derivatives, which influence the amount of production of cellulosic material (FIG. 15). Thus, the AR3110ΔcsgBAΔbcsG mutant produced significant quantities of unmodified cellulose when the cells were grown on nutrient agar medium. The absence of the normal biofilm phenotype (wrinkling on agar and pellicle formation at the air-liquid interface) is not due to failure of the mutant cells to make cellulosic material (the mutant makes regular cellulose as confirmed by NMR), but rather to the lack of the phosphoethanolamine modification.

Example 3 The Phosphoethanolamine Modification Enhances Solubility in Water

The solubility of purified E. coli phosphoethanolamine cellulose in water was compared with commercially available crystalline cellulose and commercially available carboxymethyl cellulose (produced chemically), with the latter known to be soluble in water and highly digestible by cellulases. The zwitterionic phosphoethanolamine cellulose was found to be semi-soluble in water and significantly more soluble than unmodified cellulose (FIG. 16). Its enhanced solubility indicates that phosphoethanolamine cellulose will be useful in numerous applications that currently utilize other synthetically modified celluloses.

Example 4 The Phosphoethanolamine Cellulose is More Digestible by Cellulase

Aspergillus niger cellulase was used for enzymatic hydrolysis of phosphoethanolamine cellulose, crystalline cellulose, and commercially available carboxymethyl cellulose. The glucose produced by hydrolysis was detected with a standard hexokinase assay. As shown in FIG. 17, the phosphoethanolamine cellulose produced by E. coli was more digestible by cellulase than commercial crystalline cellulose. Thus, treatment of cellulose with the BcsG-encoded phosphoethanolamine transferase renders cellulose more digestible. The transfer of the bcsG gene, by itself or in combination with the bcsE and bcsF genes, may enable the production of phosphoethanolamine cellulose in plants and other organisms. The phosphoethanolamine modification of cellulose may weaken associations with lignin or other components and aid in efforts to digest cellulosic material for applications including the conversion of cellulose and cellulosic materials into ethanol.

Although preferred embodiments of the subject invention have been described in some detail, it is understood that obvious variations can be made without departing from the spirit and the scope of the invention as defined herein.

1. A cellulose-producing host cell comprising a recombinant polynucleotide encoding a BcsG phosphoethanolamine transferase operably linked to a promoter.
2. The cellulose-producing host cell of claim 1, wherein the recombinant polynucleotide is provided by a plasmid or viral vector.
3. The cellulose-producing host cell of claim 1, wherein the recombinant nucleic acid is integrated into the host cell genome.
4. The cellulose-producing host cell of claim 1, wherein the host cell further comprises a recombinant polynucleotide comprising a BcsE gene or BcsF gene operably linked to a promoter.
5. (canceled)
6. The cellulose-producing host cell of claim 1, wherein the recombinant polynucleotide comprises a multicistronic vector expressing BcsG, BcsE, and BcsF.
7. (canceled)
8. The cellulose-producing host cell of claim 1, wherein the recombinant polynucleotide comprises a bcsEFG operon.
9. The cellulose-producing host cell of claim 1, wherein the host cell is a bacterial cell, a plant cell, or an algae cell.
10. The cellulose-producing host cell of claim 1, wherein the bacterial cell is a Gram-negative bacterium.
11. The cellulose-producing host cell of claim 10, wherein the Gram-negative bacterium belongs to a genus selected from the group consisting of Acetobacter, Agrobacterium, Escherichia, and Salmonella.
12. The cellulose-producing host cell of claim 11, wherein the Gram-negative bacterium is selected from the group consisting of Acetobacter xylinum, Escherichia coli, and Salmonella enterica.
13. The cellulose-producing host cell of claim 1, wherein cellulose production is upregulated by cyclic di-GMP.
14. The cellulose-producing host cell of claim 1, further comprising a recombinant polynucleotide comprising a promoter operably linked to a polynucleotide encoding diguanylate cyclase.
15-17. (canceled)
18. A method of producing a phosphoethanolamine cellulose, the method comprising: a) culturing the cellulose-producing host cell of claim 1 under conditions suitable for expression of the BcsG phosphoethanolamine transferase, wherein the phosphoethanolamine cellulose is produced; and b) isolating the phosphoethanolamine cellulose.
19. The method of claim 18, wherein the temperature is in a range from about 25° C. to about 37° C.
20. The method of claim 18, wherein said culturing is performed at a temperature below 30° C.
21. The method of claim 20, wherein the temperature is in a range from about 26° C. to about 28° C.
22. (canceled)
23. The method of claim 18, wherein said culturing is performed in a growth media comprising one or more carbon sources selected from the group consisting of glucose, fructose, acetate, or glycerol.
24. The method of claim 18, further comprising increasing cellulose production by contacting the cellulose-producing host cell with cyclic di-GMP.
25. The method of claim 18, wherein cyclic di-GMP levels in the cell are increased by transfecting the cellulose-producing host cell with a recombinant polynucleotide comprising a promoter operably linked to a polynucleotide encoding diguanylate cyclase.
26. The method of claim 25, wherein the promoter is an inducible promoter.
27. The method of claim 25, wherein the recombinant polynucleotide is provided by a vector.
28. (canceled)
29. A composition comprising a phosphoethanolamine cellulose ester, wherein at least one hydroxyl group of the phosphoethanolamine cellulose is esterified.
30. The composition of claim 29, wherein said at least one hydroxyl group is esterified with an organic acid, acid anhydride, acid chloride, or inorganic acid.
31. The composition of claim 30, wherein the organic acid is selected from the group consisting of acetic acid, propanoic acid, and butyric acid.
32. The composition of claim 30, wherein the inorganic acid is selected from the group consisting of nitric acid and sulfuric acid.
33. A composition comprising a phosphoethanolamine cellulose ether, wherein at least one hydroxyl group of the phosphoethanolamine cellulose is etherified.
34. The composition of claim 33, wherein the phosphoethanolamine cellulose ether is an alkyl ether, a hydroxyalkyl ether, or a carboxyalkyl ether.
35. A composition comprising a phosphoethanolamine cellulose, wherein at least one amine group is modified.
36. The composition of claim 35, wherein said at least one amine group is alkylated, acylated, or sulfonated.
37. The composition of claim 35, wherein said at least one amine group is conjugated to an agent.
38. The composition of claim 37, wherein the agent is a peptide, antibody, enzyme, nucleic acid, dye, ligand, or drug.
39. A method of hydrolyzing a phosphoethanolamine cellulose, the method comprising contacting the phosphoethanolamine cellulose with one or more cellulases.
40. The method of claim 39, wherein said one or more cellulases are endocellulases, exocellulases, beta-glucosidases, oxidative cellulases, cellulose phosphorylases, or a combination thereof.